Factorized Asymptotic Bayesian Policy Search for POMDPs
Abstract
This paper proposes a novel direct policy search (DPS) method with model selection for partially observable Markov decision processes (POMDPs). DPS methods have been standard for learning POMDPs due to their computational efficiency and natural ability to maximize total rewards. An important open challenge for making the best use of DPS methods is model selection, i.e., determining the proper dimensionality of hidden states and the complexity of policy functions, to mitigate overfitting in highly flexible model representations of POMDPs. This paper bridges Bayesian inference and reward maximization and derives a marginalized weighted log-likelihood (MWL) for POMDPs, which combines the advantages of Bayesian model selection and DPS. We then propose factorized asymptotic Bayesian policy search (FABPS), which searches for the model and policy that maximize MWL by extending the recently developed factorized asymptotic Bayesian inference. Experimental results show that FABPS outperforms state-of-the-art model selection methods for POMDPs, with respect both to model selection and to expected total rewards.
Similar resources
Efficient Planning in Large POMDPs through Policy Graph Based Factorized Approximations
Partially observable Markov decision processes (POMDPs) are widely used for planning under uncertainty. In many applications, the huge size of the POMDP state space makes straightforward optimization of plans (policies) computationally intractable. To solve this, we introduce an efficient POMDP planning algorithm. Many current methods store the policy partly through a set of “value vectors” whi...
Factorized Asymptotic Bayesian Inference for Mixture Modeling
This paper proposes a novel approximate Bayesian inference method for mixture modeling. Our key idea is to factorize the marginal log-likelihood using a variational distribution over latent variables. An asymptotic approximation, a factorized information criterion (FIC), is obtained by applying the Laplace method to each of the factorized components. In order to evaluate FIC, we propose factorize...
Factorized Asymptotic Bayesian Inference for Latent Feature Models
This paper extends factorized asymptotic Bayesian (FAB) inference to latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete log-likelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has...
Factorized Asymptotic Bayesian Hidden Markov Models
This paper addresses the issue of model selection for hidden Markov models (HMMs). We generalize factorized asymptotic Bayesian inference (FAB), which has been recently developed for model selection on independent hidden variables (i.e., mixture models), to time-dependent hidden variables. As with FAB in mixture models, FAB for HMMs is derived as an iterative lower bound maximization algorithm...
Solving POMDPs by Searching in Policy Space
Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. ...